Introduction

One of the biggest events of last year was the presidential election. To many people’s surprise, Donald Trump beat Hillary Clinton and was elected president. On the day of Trump’s inauguration, numerous protests and demonstrations took place in many major cities, since the day mattered to both his supporters and his opponents. Despite the protests, Trump’s inaugural speech was fairly well received: according to a poll run by AOL.com, 50 percent of voters liked it. A new president’s inaugural speech is often read as an indicator of his future policies and decisions. In this report, we therefore briefly analyze the presidents’ inaugural speeches, from George Washington to Donald Trump, and try to answer the question: which president’s inaugural speech is Donald Trump’s most similar to? Finally, we extend the analysis to compare Donald Trump’s and Hillary Clinton’s nomination speeches.

Preparation

Check and install needed packages. Load the libraries and functions.

packages.used=c("rvest", "tibble", "qdap", 
                "sentimentr", "gplots", "dplyr",
                "tm", "syuzhet", "factoextra", 
                "beeswarm", "scales", "RColorBrewer",
                "RANN", "topicmodels", "wordcloud",
                "tidytext", "plotly", "ggplot2", "plotrix")

# check packages that need to be installed.
packages.needed=setdiff(packages.used, 
                        intersect(installed.packages()[,1], 
                                  packages.used))
# install additional packages
if(length(packages.needed)>0){
  install.packages(packages.needed, dependencies = TRUE)
}

# load packages
library("rvest")
library("tibble")
library("qdap")
library("sentimentr")
library("gplots")
library("dplyr")
library("tm")
library("syuzhet")
library("factoextra")
library("beeswarm")
library("scales")
library("RColorBrewer")
library("RANN")
library("topicmodels")
library("wordcloud")
library("tidytext")
library("plotly")
library("ggplot2")
library("plotrix")

source("../lib/plotstacked.R")
source("../lib/speechFuncs.R")

This notebook was prepared with the following environmental settings.

print(R.version)
##                _                           
## platform       x86_64-apple-darwin13.4.0   
## arch           x86_64                      
## os             darwin13.4.0                
## system         x86_64, darwin13.4.0        
## status                                     
## major          3                           
## minor          3.1                         
## year           2016                        
## month          06                          
## day            21                          
## svn rev        70800                       
## language       R                           
## version.string R version 3.3.1 (2016-06-21)
## nickname       Bug in Your Hair

Data Harvest

Step 1: scrape speech URLs from http://www.presidency.ucsb.edu/.

Following the example of Jerid Francom, we used Selectorgadget to choose the links we would like to scrape. For this project, we selected all inaugural addresses of past presidents and the nomination addresses of Hillary Clinton and Donald Trump.

### Inaugural speeches
main.page <- read_html(x = "http://www.presidency.ucsb.edu/inaugurals.php")
# Get link URLs
# f.speechlinks is a function for extracting links from the list of speeches. 
inaug=f.speechlinks(main.page)
as.Date(inaug[,1], format="%B %e, %Y")
inaug=inaug[-nrow(inaug),] # remove the last line, irrelevant due to error.

#### Nomination speeches
main.page=read_html("http://www.presidency.ucsb.edu/nomination.php")
# Get link URLs
nomin <- f.speechlinks(main.page)

Step 2: Using speech metadata posted on http://www.presidency.ucsb.edu/.

inaug.list=read.csv("../data/inauglist.csv", stringsAsFactors = FALSE)
nomin.list=read.csv("../data/nominlist.csv",stringsAsFactors = FALSE)

We assemble all scraped speeches into one list. Note that we don’t have the full text yet, only the links to the full-text transcripts.

Step 3: scrape the texts of speeches from the speech URLs.

speech.list<-rbind(inaug.list,nomin.list)
speech.list$type=c(rep("inaug", nrow(inaug.list)),rep("nomin",nrow(nomin.list)))
speech.url=rbind(inaug,nomin)
speech.list=cbind(speech.list, speech.url)

Based on the list of speeches, we scrape the main text of each transcript’s HTML page. For simple HTML pages of this kind, Selectorgadget is very convenient for identifying the HTML node whose content rvest can extract.

# Loop over each row in speech.list
speech.list$fulltext=NA
for(i in seq(nrow(speech.list))) {
  text <- read_html(speech.list$urls[i]) %>% # load the page
    html_nodes(".displaytext") %>% # isolate the text
    html_text() # get the text
  speech.list$fulltext[i]=text
  # Create the file name
  filename <- paste0("../data/fulltext/", 
                     speech.list$type[i],
                     speech.list$File[i], "-", 
                     speech.list$Term[i], ".txt")
  # write the transcript to the file (cat opens and closes the file itself)
  cat(text, file = filename)
}
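
The file-naming and writing step of the loop can be tried without any network access. The following is a minimal sketch on made-up rows: the data frame `toy`, the two toy transcripts and the temporary output folder are assumptions for illustration, not part of the scraped data.

```r
# Hypothetical stand-ins for two rows of speech.list (not the real data).
toy <- data.frame(type = c("inaug", "nomin"),
                  File = c("ExampleA", "ExampleB"),
                  Term = c(1, 1),
                  stringsAsFactors = FALSE)
toy.text <- c("First toy transcript.", "Second toy transcript.")

# Temporary folder used only for this demonstration.
out.dir <- file.path(tempdir(), "fulltext")
dir.create(out.dir, showWarnings = FALSE)

# Same naming pattern as in the loop above: <type><File>-<Term>.txt
toy$path <- file.path(out.dir,
                      paste0(toy$type, toy$File, "-", toy$Term, ".txt"))

for (i in seq_len(nrow(toy))) {
  cat(toy.text[i], file = toy$path[i])  # one transcript per file
}
basename(toy$path)
```

Passing `file =` to `cat()` keeps the opening and closing of each file in a single call, so no connection is left open if an iteration fails.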

Part One: What Have Past Presidents Said?

Here, we first take a preliminary look at the most frequently used words in the presidential speeches. To make the analysis more meaningful, stop words such as “I” and “am”, as well as punctuation, are removed. The top 10 words are then presented below.

Step 1: Generate the corpus of all speeches, clean the data and turn the corpus into a matrix.

## Use VCorpus to build the corpus of all speeches. Use tm functions to strip extra whitespace, convert all characters to lower case, remove very common English words (stop words) such as "I" and "am", and remove punctuation.
ff.source<-VectorSource(speech.list$fulltext)
ff.all<-VCorpus(ff.source)
ff.all<-tm_map(ff.all, stripWhitespace)
ff.all<-tm_map(ff.all, content_transformer(tolower))
ff.all<-tm_map(ff.all, removeWords, stopwords("english"))
ff.all<-tm_map(ff.all, removeWords, character(0))
ff.all<-tm_map(ff.all, removePunctuation)

tdm.all<-TermDocumentMatrix(ff.all)

tdm.tidy=tidy(tdm.all)

tdm.overall=summarise(group_by(tdm.tidy, term), sum(count))
ff.matrix<-as.matrix(tdm.all)
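
To make the structure of a term-document matrix concrete, here is a minimal base-R sketch of the same cleaning steps on two made-up sentences. The toy documents and the two-word stop list are assumptions for illustration; the analysis itself relies on tm and `stopwords("english")`.

```r
docs <- c("We will build. We will win!", "The people can build.")
stop.words <- c("we", "the")  # tiny stand-in for stopwords("english")

# Lower-case, drop punctuation, split on whitespace, drop stop words.
tokens <- lapply(docs, function(d) {
  d <- tolower(d)
  d <- gsub("[[:punct:]]", "", d)
  w <- strsplit(d, "[[:space:]]+")[[1]]
  w[nzchar(w) & !(w %in% stop.words)]
})

# Rows are terms, columns are documents, cells are counts.
terms <- sort(unique(unlist(tokens)))
toy.tdm <- sapply(tokens, function(w) {
  as.integer(table(factor(w, levels = terms)))
})
rownames(toy.tdm) <- terms
colnames(toy.tdm) <- c("doc1", "doc2")

# Overall frequency of each term across documents.
sort(rowSums(toy.tdm), decreasing = TRUE)
```

Summing each row gives the overall term frequency, which is the quantity used for the bar chart and the word cloud.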

What are the 10 most frequent words?

The word that presidents use most frequently is “will.” This is easy to explain, since a president typically talks about the policies and ideas he intends to pursue during the coming four-year term. Other words such as “must” and “can” are also used frequently. These words (will, must and can) are modal verbs, which express modality. More precisely, they convey necessity and possibility to the listeners, which is exactly what a new president wants to deliver. By using these modal verbs, a new president can not only build his authority but also paint an attractive picture of the future for his supporters. Furthermore, as a world leader and the president of the United States, a new president also tends to use words such as “world”, “America”, “people” and “American.” The frequent use of these words suggests that a new president wants to demonstrate his leadership of both the American people and the Free World.

term_frequency<-rowSums(ff.matrix)
term_frequency<-sort(term_frequency,decreasing = TRUE)[1:10]
# term_frequency is already sorted, so it can be plotted directly
plot_ly(x=names(term_frequency),y=term_frequency,type="bar")

Inspect an overall wordcloud

wordcloud(tdm.overall$term, tdm.overall$`sum(count)`,
          scale=c(5,0.5),
          max.words=100,
          min.freq=1,
          random.order=FALSE,
          rot.per=0.3,
          use.r.layout=T,
          random.color=FALSE,
          colors=brewer.pal(9,"Blues"))
Note: wordcloud() issued one warning of the form “<term> could not be fit on page. It will not be plotted.” for each of many frequent terms, including america, world, country, american and states, so those terms are missing from the cloud.

Which President will Donald Trump be Most Like?

Donald Trump has brought numerous controversial topics to the United States. While he has millions of supporters, especially in “Middle America States,” he also has countless opponents who dislike his new policies. Because of his policies and his behavior on his Twitter account, many people are interested in the new president’s personality. However, personality is hard to analyze because human beings are complex. We therefore analyze the emotions Donald Trump expressed in his inaugural speech using statistical methods. We then compare the emotions in his speech with those of other presidents to find which presidential inaugural speech Donald Trump’s is most similar to.

According to several research reports, Andrew Jackson, James K. Polk, Lyndon B. Johnson and Ronald Reagan are the presidents President Trump might be most similar to, because of similar experiences, policies or claims. The following analysis therefore focuses mainly on these four presidents and President Trump.

Data Processing — generate list of sentences

We use sentences as the units of analysis for this project, as sentences are natural language units for organizing thoughts and ideas. We assign a sequential ID to each sentence in a speech (sent.id) and also calculate the number of words in each sentence as the sentence length (word.count).

sentence.list=NULL
for(i in seq_len(nrow(speech.list))){
  sentences=sent_detect(speech.list$fulltext[i],
                        endmarks = c("?", ".", "!", "|",";"))
  if(length(sentences)>0){
    emotions=get_nrc_sentiment(sentences)
    word.count=word_count(sentences)
    # colnames(emotions)=paste0("emo.", colnames(emotions))
    # Scale each sentence's emotion counts by its word count; the 0.01
    # guards against division by zero. Plain division is used instead of
    # diag(...) %*% ..., which fails when a speech has a single sentence.
    emotions=as.matrix(emotions)/(word.count+0.01)
    sentence.list=rbind(sentence.list, 
                        cbind(speech.list[i,-ncol(speech.list)],
                              sentences=as.character(sentences), 
                              word.count,
                              emotions,
                              sent.id=1:length(sentences)
                              )
    )
  }
}

Some non-sentences exist in the raw data because of erroneous extra end-of-sentence marks.

sentence.list=
  sentence.list%>%
  filter(!is.na(word.count)) 
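
The per-sentence emotion scores are counts divided by sentence length. A minimal sketch on made-up counts shows two equivalent ways of doing that division, and one edge case worth knowing about. The matrix `counts` and the vector `n.words` are hypothetical, and the 0.01 offset used in the notebook is left out here to keep the arithmetic easy to follow.

```r
# Two sentences, two emotions (made-up counts).
counts <- matrix(c(2, 0,
                   1, 3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("anger", "trust")))
n.words <- c(10, 20)

# Form 1: left-multiply by a diagonal matrix of 1 / length.
by.diag <- diag(1 / n.words) %*% counts

# Form 2: plain division; the vector is recycled down each column,
# so row i is divided by n.words[i].
by.recyc <- counts / n.words

all.equal(unname(by.diag), unname(by.recyc))

# Edge case: with a single sentence, diag() receives a scalar and
# treats it as the size of an identity matrix, not as a diagonal entry.
dim(diag(1 / 10))

# Plain division still works for a one-row matrix.
one.row <- counts[1, , drop = FALSE] / n.words[1]
one.row
```

Plain division avoids the scalar special case of `diag()` and does not build an n-by-n matrix for a speech with n sentences.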

Data analysis — length of sentences

Before comparing Donald Trump’s emotions with those of other presidents, we first analyze the sentence length of the inaugural speeches.

Overview of sentence length distribution of Inaugural speeches

This figure shows the sentence lengths in each president’s inaugural speech. Each point represents the number of words in a single sentence. If most points cluster on the left, the sentences tend to be short. If the points are spread fairly evenly or cluster on the right, many of the sentences are long.

The first interesting finding is that President Trump’s sentences are the second shortest among all presidents; only President Bush, also a Republican, has shorter sentences. In addition, presidents of the earlier era (18th and 19th centuries) such as Andrew Jackson and James K. Polk tend to use longer sentences than presidents such as Donald Trump, Lyndon B. Johnson and Ronald Reagan. Based on this result, we may initially expect Donald Trump to be most similar to Lyndon B. Johnson and Ronald Reagan.

sentence.list.sel<-sentence.list%>%filter(Term=="1",type=="inaug")
sentence.list.sel$File=factor(sentence.list.sel$File)

sentence.list.sel$FileOrdered=reorder(sentence.list.sel$File, 
                                  sentence.list.sel$word.count, 
                                  mean, 
                                  order=T)
par(mar=c(4, 11, 2, 2))

beeswarm(word.count~FileOrdered, 
         data=sentence.list.sel,
         horizontal = TRUE,
         pch=16, col=alpha(brewer.pal(9, "Set1"), 0.6), 
         cex=0.55, cex.axis=0.8, cex.lab=0.8,
         spacing=5/nlevels(sentence.list.sel$FileOrdered),
         las=2, ylab="", xlab="Number of words in a sentence.",
         main="Inaugural Speeches")
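
The ordering used in the beeswarm plot can be checked on a small example. The sketch below, on made-up sentence lengths for three hypothetical speakers A, B and C, orders the speakers by mean sentence length with `tapply()` and confirms that `reorder()` produces the same level order.

```r
# Made-up sentence lengths for three hypothetical speakers.
toy <- data.frame(File = c("A", "A", "B", "B", "C"),
                  word.count = c(30, 50, 10, 14, 22),
                  stringsAsFactors = FALSE)

# Mean sentence length per speaker, sorted from shortest to longest.
mean.len <- tapply(toy$word.count, toy$File, mean)
ordered.files <- names(sort(mean.len))

# Factor whose level order follows the mean sentence length.
toy$FileOrdered <- factor(toy$File, levels = ordered.files)

# reorder() yields the same level order in one call.
same.order <- identical(
  levels(reorder(factor(toy$File), toy$word.count, mean)),
  ordered.files
)
ordered.files
```

Because the plot draws factor levels in order, speakers with the shortest average sentences end up at one end of the axis and those with the longest at the other.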

Data analysis — sentiment analysis

Sentence length variation over the course of the speech, with emotions.

For each extracted sentence, we apply sentiment analysis using the NRC sentiment lexicon. “The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were manually done by crowdsourcing.”

Overview of clustering of emotions

In this figure, “fear”, “anger”, “disgust” and “sadness” seem to cluster into one group, which we can call negative emotions. “Anticipation”, “surprise”, “joy” and “trust” seem to cluster into another group, which we can call positive emotions.

heatmap.2(cor(sentence.list%>%filter(type=="inaug")%>%select(anger:trust)), 
          scale = "none", 
          col = bluered(100), margins=c(6, 6), key=F,
          trace = "none", density.info = "none")
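
The grouping read off the heatmap can also be obtained numerically. The following is a rough sketch on simulated scores, not on the speech data: four toy emotions are generated from two hidden factors, and hierarchical clustering on 1 minus the correlation is used as one common way to group correlated variables. The simulated data, the seed and the choice of distance are all assumptions for illustration; `heatmap.2` applies its own default distance when drawing the dendrogram.

```r
set.seed(1)  # arbitrary seed so the toy data are reproducible
n <- 200
neg <- rnorm(n)  # hidden "negative" factor
pos <- rnorm(n)  # hidden "positive" factor

# Four toy emotion scores, two driven by each hidden factor.
toy <- cbind(anger = neg + rnorm(n, sd = 0.3),
             fear  = neg + rnorm(n, sd = 0.3),
             joy   = pos + rnorm(n, sd = 0.3),
             trust = pos + rnorm(n, sd = 0.3))

# Correlation between emotions across sentences.
emo.cor <- cor(toy)

# Treat 1 - correlation as a distance and cut the tree into two groups.
groups <- cutree(hclust(as.dist(1 - emo.cor)), k = 2)
groups
```

Emotions that rise and fall together across sentences have a high correlation, hence a small distance, and end up in the same group.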

Overview of Emotions in the Presidents’ Inaugural Speeches

The following chart provides the overall view of emotions in the presidents’ inaugural speeches. As we can see in this figure, most emotions in inaugural speeches are positive, such as trust, anticipation and joy.

overall.emo<-colMeans(sentence.list%>%filter(type=="inaug")%>%select(anger:trust)>0.01)
col.use=c("red2", "darkgoldenrod1", 
            "chartreuse3", "blueviolet",
            "darkgoldenrod2", "dodgerblue3", 
            "darkgoldenrod1", "darkgoldenrod1")
barplot(overall.emo[order(overall.emo)], las=2, col=col.use[order(overall.emo)], horiz=T, main="Inaugural Speeches")
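
The quantity plotted in this bar chart is the share of sentences in which an emotion is present, not the average intensity. A minimal sketch on made-up scores illustrates the computation; the values in `toy` are assumptions, and the 0.01 threshold matches the one used above.

```r
# Made-up normalized emotion scores for four sentences.
toy <- data.frame(anger = c(0, 0.05, 0, 0.10),
                  joy   = c(0.08, 0.02, 0.005, 0.20))

# toy > 0.01 is a logical matrix; the mean of a logical column is the
# proportion of TRUE values, i.e. the share of sentences above 0.01.
share <- colMeans(toy > 0.01)
share
```

Here anger exceeds the threshold in two of four sentences and joy in three of four, so the shares are 0.5 and 0.75.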

Individual emotions: comparing Donald Trump with four similar presidents.

The following charts show how emotions are distributed over the course of each speech. If one color appears frequently, the corresponding emotion is pervasive.

From the following charts, we can see that the chart for Donald Trump is very colorful, which means that many emotions are present in his inaugural speech. In addition, anger appears more frequently in his speech than in the other presidential inaugural speeches. It is noteworthy that Andrew Jackson’s emotions look rather monotonous, because his speech contains fewer words than the other presidents’ speeches. Still, based on this analysis, we expect the emotions of Donald Trump to be similar to those of Ronald Reagan, James K. Polk and Lyndon B. Johnson.

par(mfrow=c(3,2), mar=c(1,0,2,0), bty="n", xaxt="n", yaxt="n", font.main=1)

f.plotsent.len(In.list=sentence.list, InFile="DonaldJTrump", 
               InType="inaug", InTerm=1, President="Donald Trump")
# 
f.plotsent.len(In.list=sentence.list, InFile="AndrewJackson", 
               InType="inaug", InTerm=1, President="Andrew Jackson")
# 1829
f.plotsent.len(In.list=sentence.list, InFile="RonaldReagan", 
               InType="inaug", InTerm=1, President="Ronald Reagan")
# 2427
f.plotsent.len(In.list=sentence.list, InFile="JamesKPolk", 
               InType="inaug", InTerm=1, President="James K. Polk")
# 4809

f.plotsent.len(In.list=sentence.list, InFile="LyndonBJohnson", 
               InType="inaug", InTerm=1, President="Lyndon B. Johnson")

In the following chart, we quantify the emotions to improve on the previous visualization. According to this chart, the composition and ratio of emotions are highly similar in James K. Polk’s, Lyndon B. Johnson’s and Ronald Reagan’s speeches. In both Donald Trump’s and Andrew Jackson’s speeches, however, the emotions differ markedly from those of the other three presidents. This finding does not match our previous expectation, so more analysis is needed.

trump.emo<-colMeans(sentence.list%>%filter(President=="Donald J. Trump")%>%select(anger:trust)>0.01)
andrew.emo<-colMeans(sentence.list%>%filter(President=="Andrew Jackson")%>%select(anger:trust)>0.01)
ronald.emo<-colMeans(sentence.list%>%filter(President=="Ronald Reagan")%>%select(anger:trust)>0.01)
james.emo<-colMeans(sentence.list%>%filter(President=="James K. Polk")%>%select(anger:trust)>0.01)
lyndon.emo<-colMeans(sentence.list%>%filter(President=="Lyndon B. Johnson")%>%select(anger:trust)>0.01)

emotion<-data.frame(trump=trump.emo,andrew=andrew.emo,ronald=ronald.emo,james=james.emo,lyndon=lyndon.emo)
emotion<-t(emotion)
Presidents <- c("Donald J. Trump","Andrew Jackson","Ronald Reagan","James K. Polk","Lyndon B. Johnson")
data <- data.frame(Presidents,emotion)

p <- plot_ly(data, x = ~Presidents, y = ~joy, type = 'bar', name = 'joy') %>%
  add_trace(y = ~anticipation, name = 'anticipation') %>%
  add_trace(y = ~fear, name = 'fear') %>%
  add_trace(y = ~sadness, name = 'sadness') %>%
  add_trace(y = ~disgust, name = 'disgust') %>%
  layout(yaxis = list(title = 'Emotions'), barmode = 'stack')
p

Cluster presidents according to their emotions in inaugural speeches.

The following chart shows the clusters of presidents according to the emotions in their inaugural speeches, obtained with k-means clustering. Across different values of k, we find that Andrew Jackson is always clustered with Donald Trump. Since clustering gives more reliable results than simple visual inspection, we revise our expectation: the emotions in President Trump’s speech are most similar to those in President Jackson’s.

presid.summary=tbl_df(sentence.list)%>%
  filter(type=="inaug")%>%
  #group_by(paste0(type, File))%>%
  group_by(File)%>%
  summarise(
    anger=mean(anger),
    anticipation=mean(anticipation),
    disgust=mean(disgust),
    fear=mean(fear),
    joy=mean(joy),
    sadness=mean(sadness),
    surprise=mean(surprise),
    trust=mean(trust)
    #negative=mean(negative),
    #positive=mean(positive)
  )
# always stay with andrew jackson, sometimes ronald always different with james,lb

presid.summary=as.data.frame(presid.summary)
rownames(presid.summary)=as.character((presid.summary[,1]))
km.res=kmeans(presid.summary[,-1], iter.max=200,
              5)
fviz_cluster(km.res, 
             stand=F, repel= TRUE,
             data = presid.summary[,-1], xlab="", xaxt="n",
             show.clust.cent=FALSE)
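
Checking whether two presidents stay together across several values of k can be scripted rather than read off repeated plots. The sketch below uses six made-up profiles P1 to P6 with two emotion columns; the data, the seed and the `nstart` value are assumptions for illustration. Fixing a seed and using several random starts makes the k-means result repeatable.

```r
set.seed(42)  # arbitrary seed; k-means starts from random centers

# Made-up emotion profiles; P1 and P2 are deliberately very close.
toy <- rbind(P1 = c(0.10, 0.90), P2 = c(0.11, 0.89),
             P3 = c(0.80, 0.20), P4 = c(0.82, 0.22),
             P5 = c(0.50, 0.50), P6 = c(0.52, 0.48))
colnames(toy) <- c("anger", "trust")

# For each k, record whether P1 and P2 land in the same cluster.
together <- sapply(2:4, function(k) {
  cl <- kmeans(toy, centers = k, nstart = 25)$cluster
  unname(cl["P1"] == cl["P2"])
})
names(together) <- paste0("k=", 2:4)
together
```

A pair that shares a cluster for every k tested is a more robust finding than one that is only grouped together for a single choice of k.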

Now we extend the analysis to the differences between the nomination speeches of Donald Trump and Hillary Clinton. Hillary Clinton is a politician known for her diplomatic experience, her feminist views and her husband. She was widely considered the candidate most likely to win the 2016 presidential election. However, to many people’s surprise, Donald Trump won. It is therefore interesting to examine what happened during the campaign, and the material we analyze is the two candidates’ nomination speeches. A nomination speech is given when a candidate accepts his or her party’s nomination for the presidency. We first list twenty-five words that appear in both Trump’s and Clinton’s speeches, choosing those whose frequencies differ the most between the two. We then use word clouds to compare the commonly used words in both speeches.

Difference between Donald Trump’s and Hillary Clinton’s Nomination Speeches

Common wordcloud and comparison wordcloud

In this chart, we can see that both Donald Trump and Hillary Clinton use words related to promises about the future, such as “will”, “going” and “can.” Donald Trump in particular uses “will” much more often than Hillary Clinton. To look at this more closely, we make a word cloud for the comparison.

all.trump<-speech.list%>%filter(President=="Donald J. Trump", type=="nomin")%>%select(fulltext)
all.clinton<-speech.list%>%filter(President=="Hillary Clinton", type=="nomin")%>%select(fulltext)
all.speech<-c(all.trump,all.clinton)
all.speech<-VectorSource(all.speech)
ff.clean<-VCorpus(all.speech)
ff.clean<-tm_map(ff.clean, stripWhitespace)
ff.clean<-tm_map(ff.clean, content_transformer(tolower))
ff.clean<-tm_map(ff.clean, removeWords, stopwords("english"))
ff.clean<-tm_map(ff.clean, removeWords, character(0))
ff.clean<-tm_map(ff.clean, removePunctuation)
all_tdm<-TermDocumentMatrix(ff.clean)
colnames(all_tdm)<-c("Donald J. Trump","Hillary Clinton")
all_m<-as.matrix(all_tdm)
commonality.cloud(all_m,max.words = 100,colors="steelblue1")

common_words <- subset(all_m, all_m[, 1] > 0 & all_m[, 2] > 0)
difference<-abs(common_words[,1]-common_words[,2])
common_words<-cbind(common_words,difference)
common_words<-common_words[order(common_words[,3],decreasing = T),]
top25_df<-data.frame(x=common_words[1:25,1],
                     y=common_words[1:25,2],
                     labels=rownames(common_words[1:25,]))
library(plotrix)
pyramid.plot(top25_df$x,top25_df$y,labels=top25_df$labels,gap=8,top.labels=c("Donald J. Trump","Words","Hillary Clinton"),main="Words in Common",laxlab=NULL,raxlab=NULL,unit=NULL)

## [1] 5.1 4.1 4.1 2.1
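
The selection of words for the pyramid plot keeps only terms used by both speakers and ranks them by the absolute difference in counts. A minimal sketch on a made-up term matrix makes the ranking explicit; the words, counts and speaker labels are assumptions for illustration.

```r
# Made-up term counts for two hypothetical speakers.
toy.m <- matrix(c(12, 3,
                   0, 5,
                   7, 6,
                   4, 0,
                   2, 9),
                ncol = 2, byrow = TRUE,
                dimnames = list(c("will", "rights", "people",
                                  "border", "together"),
                                c("SpeakerA", "SpeakerB")))

# Keep only the terms that both speakers use at least once.
common <- toy.m[toy.m[, 1] > 0 & toy.m[, 2] > 0, , drop = FALSE]

# Rank the shared terms by how much their counts differ.
gap <- abs(common[, 1] - common[, 2])
common <- common[order(gap, decreasing = TRUE), , drop = FALSE]
common
```

In this toy example “rights” and “border” are dropped because only one speaker uses them, and the remaining terms are ordered “will”, “together”, “people”.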

The following graph is the comparison word cloud for Donald Trump and Hillary Clinton. As mentioned before, Donald Trump frequently used “will” in his speech. In addition, alarming words such as “terrorism” and “violence” appear more frequently in his speech. It seems that Trump’s strategy was to heighten listeners’ concerns about potential dangers in the US. It is noteworthy that many of the words in Trump’s cloud are more specific and policy-related. In other words, Donald Trump tended to mention his specific plans directly in his nomination speech. Hillary Clinton, by contrast, seems to have favored more general terms such as “people”, “rights” and “family”, and her speech contained few detailed explanations of her policies or plans.

comparison.cloud(all_m,colors = c("orange","blue"),max.words = 50)